Lindsay Jones
October 31, 2022
Warning in model.response(mf, "numeric"): using type = "numeric" with a factor
response will be ignored
Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
factors
Call:
lm(formula = pi ~ re, data = ss)
Coefficients:
(Intercept) re.L re.Q re.C
3.5253 2.1864 0.1049 -0.6958
Call:
lm(formula = hi ~ tv, data = ss)
Residuals:
Min 1Q Median 3Q Max
-1.2583 -0.2456 0.0417 0.3368 0.7051
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.441353 0.085345 40.323 <2e-16 ***
tv -0.018305 0.008658 -2.114 0.0388 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4467 on 58 degrees of freedom
Multiple R-squared: 0.07156, Adjusted R-squared: 0.05555
F-statistic: 4.471 on 1 and 58 DF, p-value: 0.03879
---
title: "Homework 3"
author: "Lindsay Jones"
description: The third homework
date: "10/31/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw3
---
# Homework 3
## Setup
```{r}
library(tidyr)
library(alr4)
library(smss)
library(ggplot2)
```
## Question 1
```{r}
data(UN11)
```
### 1.1
**Identify the predictor and the response.**
Since we're studying the dependence of fertility on ppgdp, the [predictor]{.underline} is **ppgdp** and the [response]{.underline} is **fertility**.
### 1.2
**Draw the scatterplot of `fertility` on the vertical axis versus `ppgdp` on the horizontal axis and summarize the information in this graph. Does a straight-line mean function seem to be plausible for a summary of this graph?**
```{r}
scatterplot(fertility ~ ppgdp, UN11)
```
The relationship is clearly curvilinear: fertility falls steeply as ppgdp rises from very low values and then levels off, so a straight-line mean function is not plausible.
### 1.3
**Draw the scatterplot of log(fertility) versus log(ppgdp) using natural logarithms. Does the simple linear regression model seem plausible for a summary of this graph? If you use a different base of logarithms, the shape of the graph won't change, but the values on the axes will change.**
```{r}
scatterplot(log(fertility) ~ log(ppgdp), UN11)
```
After taking logarithms of both variables, the points scatter around a roughly straight, downward-sloping line, so a simple linear regression model is much more plausible.
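
As a rough check on the visual impression, the log-log model can be fitted directly. The sketch below assumes the `alr4` package is installed; the object names `fit_ln` and `fit_10` are illustrative and not part of the original homework. It also refits the model with base-10 logarithms to illustrate the remark in the question about changing the base: the slope of a log-log model is the same for either base, and only the intercept is rescaled.

```{r}
library(alr4)  # provides the UN11 data
data(UN11)

# Simple linear regression on the natural-log scale
fit_ln <- lm(log(fertility) ~ log(ppgdp), data = UN11)

# Same model with base-10 logarithms
fit_10 <- lm(log10(fertility) ~ log10(ppgdp), data = UN11)

# The slopes agree; only the intercept changes with the base
coef(fit_ln)
coef(fit_10)
```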
## Question 2
### 2.1
**How, if at all, does the slope of the prediction equation change?**
The slope of the prediction equation is multiplied by 1.33. This is a change of scale, not an additive increase of 1.33.
### 2.2
**How, if at all, does the correlation change?**
The correlation does not change. Correlation is unit-free, so rescaling a variable by a positive constant leaves it unaffected.
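
A small simulation can illustrate both answers. This is a sketch on made-up data, not the data from the exercise: the variable names, the seed, and the direction of the rescaling (dividing the explanatory variable by 1.33) are assumptions chosen for illustration.

```{r}
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical data: x in the original units, y the response
x <- runif(100, min = 10, max = 100)
y <- 2 + 0.5 * x + rnorm(100, sd = 5)

# Re-express x in new units by dividing by a constant
x_rescaled <- x / 1.33

slope_original <- coef(lm(y ~ x))[["x"]]
slope_rescaled <- coef(lm(y ~ x_rescaled))[["x_rescaled"]]

# Ratio of the two slopes: the rescaled slope is 1.33 times the original
slope_rescaled / slope_original

# The correlation is the same before and after rescaling
cor(x, y)
cor(x_rescaled, y)
```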
## Question 3
**Draw the scatterplot matrix for these data and summarize the information available from these plots. (Hint: Use the pairs() function.)**
```{r}
data(water)
pairs(water)
```
Stream runoff (`BSAAM`) appears to be strongly and positively correlated with precipitation at OPBPC, OPRC, and OPSLAKE, so precipitation at those sites could potentially be used to predict water supply. The relationship between runoff and precipitation at the other sites is weakly positive, if it exists at all.
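
The visual impression can be checked numerically with the correlation matrix. The sketch below assumes the `alr4` package is installed and that runoff is stored in the `BSAAM` column of `water`; `runoff_cor` and `sites` are illustrative names.

```{r}
library(alr4)  # provides the water data
data(water)

# Correlation of every variable with stream runoff (BSAAM)
runoff_cor <- cor(water)[, "BSAAM"]

# Drop Year and BSAAM itself, then sort from strongest to weakest
sites <- setdiff(names(runoff_cor), c("Year", "BSAAM"))
round(sort(runoff_cor[sites], decreasing = TRUE), 2)
```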
## Question 4
**Create a scatterplot matrix of these five variables. Provide a brief description of the relationships between the five ratings.**
```{r}
data("Rateprof")
# select() comes from dplyr, which is not attached above, so call it explicitly
rp <- Rateprof %>%
  dplyr::select(quality, helpfulness, clarity, easiness, raterInterest)
pairs(rp)
```
- Rater interest appears to have no correlation (or possibly a very weak positive correlation) with any other variable.
- Quality has a strong positive correlation with helpfulness and clarity, and a weak positive correlation with easiness.
- Helpfulness has a strong positive correlation with clarity and a weak positive correlation with easiness.
- Clarity has a weak positive correlation with easiness (easiness has a weak positive correlation with every variable).
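
The relationships read off the scatterplot matrix can be compared against the pairwise correlations. This is a minimal sketch, assuming the `alr4` package is installed; it selects the columns with base R indexing so that no additional package is needed, and `ratings` is an illustrative name.

```{r}
library(alr4)  # provides the Rateprof data
data(Rateprof)

# Select the five rating variables with base R indexing
ratings <- Rateprof[, c("quality", "helpfulness", "clarity",
                        "easiness", "raterInterest")]

# Pairwise Pearson correlations, rounded for readability
round(cor(ratings), 2)
```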
## Question 5
```{r}
data("student.survey")
ss <- student.survey
```
### 5.1
#### 5.1.a
```{r}
lm(pi ~ re, data = ss)
```
#### 5.1.b
I could not make my code work for the categorical variables in this particular regression.
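
`lm()` warns when the response is a factor, and `pi` is stored as an ordered factor. One possible workaround, sketched below, is to replace the ordered categories with integer scores via `as.numeric()`. This assumes the categories can be treated as equally spaced, which is a modelling choice rather than something the data guarantee. The names `pi_score`, `re_score`, and `fit1` are illustrative.

```{r}
library(smss)  # provides the student.survey data
data("student.survey")
ss <- student.survey

# Convert the ordered factors to integer scores (1 = lowest level)
ss$pi_score <- as.numeric(ss$pi)
ss$re_score <- as.numeric(ss$re)

# Regress the political ideology score on the religiosity score
fit1 <- lm(pi_score ~ re_score, data = ss)
summary(fit1)
```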
### 5.2
#### 5.2.a
```{r}
fit2 <- lm(hi ~ tv, data = ss)
plot(hi ~ tv, data = ss)
abline(fit2)
```
#### 5.2.b
```{r}
# Reuse the model fitted in 5.2.a instead of refitting it
summary(fit2)
```
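The slope for `tv` is negative and statistically significant at the 0.05 level (estimate -0.018, p = 0.039), so there is some evidence that more hours of TV are associated with a lower high school GPA. The association is weak, however: R-squared is only 0.072, meaning that hours of TV explain about 7% of the variation in GPA, and the plot shows wide scatter around the fitted line.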
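
For a simple linear regression, R-squared equals the squared Pearson correlation between the two variables, which gives a direct way to read the strength of the relationship. A minimal sketch, assuming the `smss` package is installed; `r_tv_hi` is an illustrative name.

```{r}
library(smss)  # provides the student.survey data
data("student.survey")
ss <- student.survey

# Pearson correlation between TV hours and high school GPA
r_tv_hi <- cor(ss$tv, ss$hi)
r_tv_hi

# Its square matches the Multiple R-squared reported by summary()
r_tv_hi^2
summary(lm(hi ~ tv, data = ss))$r.squared
```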